Improving Quality of Search Results Clustering with Approximate Matrix Factorisations

Author

  • Stanislaw Osinski
Abstract

In this paper we show how approximate matrix factorisations can be used to organise document summaries returned by a search engine into meaningful thematic categories. We compare four different factorisations (SVD, NMF, LNMF and K-Means/Concept Decomposition) with respect to topic separation capability, outlier detection and label quality. We also compare our approach with two other clustering algorithms: Suffix Tree Clustering (STC) and Tolerance Rough Set Clustering (TRC). For our experiments we use the standard merge-then-cluster approach based on the Open Directory Project web catalogue as a source of human-clustered document summaries.
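
As a rough illustration of the general idea only, and not the paper's implementation, the sketch below groups a few made-up search snippets with spherical k-means, whose normalised centroids play the role of the basis vectors in a concept decomposition. The snippets, the function names (`tf_vectors`, `spherical_kmeans`) and the seed choice are all assumptions made for this example; a real system would also apply stop-word removal and term weighting.

```python
import math
from collections import Counter


def tf_vectors(snippets):
    """Turn snippets into unit-length term-frequency vectors
    (the columns of a small term-document matrix)."""
    vocab = sorted({w for s in snippets for w in s.lower().split()})
    vectors = []
    for s in snippets:
        counts = Counter(s.lower().split())
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, seeds, iterations=10):
    """Cluster unit vectors by cosine similarity.
    `seeds` lists the indices of the documents used as initial centroids."""
    centroids = [vectors[i][:] for i in seeds]
    assignment = []
    for _ in range(iterations):
        # Assign each document to the most similar centroid.
        assignment = [
            max(range(len(centroids)), key=lambda c: dot(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the normalised sum of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if not members:
                continue  # keep the old centroid if a cluster is empty
            total = [sum(col) for col in zip(*members)]
            norm = math.sqrt(sum(x * x for x in total)) or 1.0
            centroids[c] = [x / norm for x in total]
    return assignment, centroids


# Hypothetical snippets for an ambiguous query ("jaguar").
snippets = [
    "jaguar car engine speed",
    "jaguar car dealer price",
    "jaguar cat jungle habitat",
    "jaguar cat prey jungle",
]
vocab, vectors = tf_vectors(snippets)
assignment, centroids = spherical_kmeans(vectors, seeds=[0, 2])
print(assignment)
```

The centroids form a low-rank approximate basis for the term-document matrix; the highest-weighted terms of each centroid could then be inspected as candidate cluster labels.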

Similar resources

A Free Line Search Steepest Descent Method for Solving Unconstrained Optimization Problems

In this paper, we solve unconstrained optimization problems using a free line search steepest descent method. First, we propose a double-parameter scaled quasi-Newton formula for calculating an approximation of the Hessian matrix. The approximation obtained from this formula is a positive definite matrix that satisfies the standard secant relation. We also show that the largest eigenvalue...

The ensemble clustering with maximize diversity using evolutionary optimization algorithms

Data clustering is one of the main steps in data mining, which is responsible for exploring hidden patterns in non-tagged data. Due to the complexity of the problem and the weakness of the basic clustering methods, most studies today are guided by clustering ensemble methods. Diversity in primary results is one of the most important factors that can affect the quality of the final results. Also...

Water Quality Zoning of Rivers by the Technique of Fuzzy Clustering Analysis

Zoning the pollution of a river may be the first or even the most important step in water quality management. In order to resolve its pollution, fuzzy clustering analysis may be used whenever a composite classification of water quality incorporates multiple parameters. In such cases, the technique may be used as a complement or an alternative to comprehensive assessment. In fuzzy cluster...

Carrot2 and Language Properties in Web Search Results Clustering

This paper relates to a technique of improving results visualization in Web search engines known as search results clustering. We introduce an open extensible research system for examination and development of search results clustering algorithms – Carrot2. We also discuss attempts at measuring the quality of discovered clusters and demonstrate results of our experiments with quality assessment when...

Publication date: 2006